An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data
Authors
Abstract
The discrete data structure and large sequencing depth of RNA sequencing (RNA-seq) experiments can often generate outlier read counts in one or more RNA samples within a homogeneous group. How to identify and manage outlier observations in RNA-seq data is therefore an emerging topic of interest. One of the main objectives of this research is to develop statistical methodology that effectively balances the impact of outlier observations and achieves maximal power for statistical testing. Improving the accuracy of outlier detection is an important precursor to that goal. Current outlier detection algorithms for RNA-seq data are executed within a testing framework and may be sensitive to sparse data and heavy-tailed distributions. We therefore propose a univariate algorithm that uses a probabilistic approach to measure the deviation between an observation and the distribution generating the remaining data, and we implement it within an iterative leave-one-out design strategy. Analyses of real and simulated RNA-seq data show that the proposed methodology has higher outlier detection rates for both non-normalized and normalized negative binomial distributed data.
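The general idea of an iterative leave-one-out check can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes a method-of-moments negative binomial fit (with a Poisson fallback when the remaining counts show no overdispersion), a two-sided tail probability as the deviation measure, and a hypothetical threshold `alpha`. The function names are illustrative only.

```python
import math
from statistics import mean, variance


def log_pmf(k, m, v):
    """Log-probability of count k under a distribution with mean m, variance v."""
    if m == 0:
        return 0.0 if k == 0 else -math.inf
    if v <= m:
        # Assumption: no overdispersion, so fall back to a Poisson with mean m.
        return k * math.log(m) - m - math.lgamma(k + 1)
    r = m * m / (v - m)  # negative binomial size, method of moments
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(r / (r + m)) + k * math.log(m / (r + m)))


def tail_prob(x, others):
    """Smaller of P(X <= x) and P(X >= x) under the fit to the other counts."""
    m, v = mean(others), variance(others)
    lower = sum(math.exp(log_pmf(k, m, v)) for k in range(x + 1))
    upper = 1.0 - lower + math.exp(log_pmf(x, m, v))
    return min(lower, min(1.0, max(upper, 0.0)))


def loo_outliers(counts, alpha=0.01):
    """Flag the most extreme count, drop it, and repeat until none is extreme."""
    remaining = list(range(len(counts)))
    flagged = []
    while len(remaining) > 3:  # keep enough data to estimate a variance
        scores = []
        for i in remaining:
            others = [counts[j] for j in remaining if j != i]
            scores.append((tail_prob(counts[i], others), i))
        prob, worst = min(scores)
        if prob >= alpha:
            break
        flagged.append(worst)
        remaining.remove(worst)
    return flagged


if __name__ == "__main__":
    # Read counts for one gene across samples in one group (made-up values).
    print(loo_outliers([10, 12, 9, 11, 13, 10, 500]))
```

Removing one flagged observation per pass means that a single extreme count cannot inflate the variance estimate used to judge the remaining samples in later passes.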
Similar resources
Theory and Methodology An Outlier Detection Methodology with Consideration for an Inefficient Frontier
In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results, particularly for two-stage analyses. In the DEA-related literature, prior work on this issue has focused on the efficient frontier as a basis for detecting outliers. An iterative approach to deal with the potential for one outlie...
Simultaneous robust estimation of multi-response surfaces in the presence of outliers
A robust approach should be considered when estimating regression coefficients in multi-response problems. Many models are derived from the least squares method. Because the presence of outlier data is unavoidable in most real cases and because the least squares method is sensitive to these types of points, robust regression approaches appear to be a more reliable and suitable method for addres...
Outlier detection in two-stage semiparametric DEA models
In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results, particularly for two-stage analyses. In the DEA-related literature, prior work on this issue has focused on the efficient frontier as a basis for detecting outliers. An iterative approach for dealing with the potential for one ou...
A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data
Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...
Journal title:
Volume 10, issue
Pages -
Publication date: 2015